{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Help Desk - Easy\n",
    "\n",
    "## Scenario\n",
    "A software company has been successful in selling its products to a number of customer organisations, and there is now a high demand for technical support. There is already a system in place for logging support calls taken over the telephone and assigning them to engineers, but it is based on a series of spreadsheets. With the growing volume of data, using the spreadsheet system is becoming slow, and there is a significant risk that errors will be made.\n",
    "\n",
    "![rel](https://sqlzoo.net/w/images/3/38/Helpdesk.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Setting default log level to \"WARN\".\n",
      "To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n"
     ]
    }
   ],
   "source": [
    "import findspark\n",
    "import pandas as pd\n",
    "findspark.init()\n",
    "\n",
    "SVR = '192.168.31.31'\n",
    "from pyspark.sql import SparkSession\n",
    "from pyspark.sql.functions import *\n",
    "\n",
    "sc = (SparkSession.builder.appName('app12-1') \n",
    "      .master(f'spark://{SVR}:7077') \n",
    "      .config('spark.sql.warehouse.dir', f'hdfs://{SVR}:9000/user/hive/warehouse') \n",
    "      .config('spark.cores.max', '4') \n",
    "      .config('spark.executor.instances', '1') \n",
    "      .config('spark.executor.cores', '2') \n",
    "      .config('spark.executor.memory', '10g') \n",
    "      .enableHiveSupport().getOrCreate())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "shift = sc.read.table('sqlzoo.Shift')\n",
    "staff = sc.read.table('sqlzoo.Staff')\n",
    "issue = sc.read.table('sqlzoo.Issue')\n",
    "shift_type = sc.read.table('sqlzoo.Shift_type')\n",
    "level = sc.read.table('sqlzoo.Level')\n",
    "customer = sc.read.table('sqlzoo.Customer')\n",
    "caller = sc.read.table('sqlzoo.Caller')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.\n",
    "There are three issues that include the words \"index\" and \"Oracle\". Find the call_date for each of them\n",
    "\n",
    "```\n",
    "+---------------------+----------+\n",
    "| call_date           | call_ref |\n",
    "+---------------------+----------+\n",
    "| 2017-08-12 16:00:00 |     1308 |\n",
    "| 2017-08-16 14:54:00 |     1697 |\n",
    "| 2017-08-16 19:12:00 |     1731 |\n",
    "+---------------------+----------+\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "                                                                                \r"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Call_date</th>\n",
       "      <th>Call_ref</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017-08-12 16:00:00.0</td>\n",
       "      <td>1308</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017-08-16 14:54:00.0</td>\n",
       "      <td>1697</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2017-08-16 19:12:00.0</td>\n",
       "      <td>1731</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Call_date  Call_ref\n",
       "0  2017-08-12 16:00:00.0      1308\n",
       "1  2017-08-16 14:54:00.0      1697\n",
       "2  2017-08-16 19:12:00.0      1731"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(issue.filter((col('Detail').contains('index')) & \n",
    "              (col('Detail').contains('Oracle')))\n",
    " .select('Call_date', 'Call_ref')\n",
    " .toPandas())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.\n",
    "Samantha Hall made three calls on 2017-08-14. Show the date and time for each\n",
    "\n",
    "```\n",
    "+---------------------+------------+-----------+\n",
    "| call_date           | first_name | last_name |\n",
    "+---------------------+------------+-----------+\n",
    "| 2017-08-14 10:10:00 | Samantha   | Hall      |\n",
    "| 2017-08-14 10:49:00 | Samantha   | Hall      |\n",
    "| 2017-08-14 18:18:00 | Samantha   | Hall      |\n",
    "+---------------------+------------+-----------+\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Call_date</th>\n",
       "      <th>First_name</th>\n",
       "      <th>Last_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017-08-14 10:10:00.0</td>\n",
       "      <td>Samantha</td>\n",
       "      <td>Hall</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017-08-14 10:49:00.0</td>\n",
       "      <td>Samantha</td>\n",
       "      <td>Hall</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2017-08-14 18:18:00.0</td>\n",
       "      <td>Samantha</td>\n",
       "      <td>Hall</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Call_date First_name Last_name\n",
       "0  2017-08-14 10:10:00.0   Samantha      Hall\n",
       "1  2017-08-14 10:49:00.0   Samantha      Hall\n",
       "2  2017-08-14 18:18:00.0   Samantha      Hall"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(issue.filter((to_date(col('Call_date'))=='2017-08-14'))\n",
    " .join(caller.filter((col('First_name')=='Samantha') & \n",
    "                     (col('Last_name')=='Hall')), on='Caller_id')\n",
    " .select('Call_date', 'First_name', 'Last_name')\n",
    " .toPandas())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.\n",
    "There are 500 calls in the system (roughly). Write a query that shows the number that have each status.\n",
    "\n",
    "```\n",
    "+--------+--------+\n",
    "| status | Volume |\n",
    "+--------+--------+\n",
    "| Closed |    486 |\n",
    "| Open   |     10 |\n",
    "+--------+--------+\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Status</th>\n",
       "      <th>Volume</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Open</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Closed</td>\n",
       "      <td>486</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Status  Volume\n",
       "0    Open      10\n",
       "1  Closed     486"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(issue.groupBy('Status')\n",
    " .agg(count('Call_ref').alias('Volume'))\n",
    " .toPandas())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.\n",
    "Calls are not normally assigned to a manager but it does happen. How many calls have been assigned to staff who are at Manager Level?\n",
    "\n",
    "```\n",
    "+------+\n",
    "| mlcc |\n",
    "+------+\n",
    "|   51 |\n",
    "+------+\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mlcc</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>51</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   mlcc\n",
       "0    51"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(issue.join(staff, on=(issue['Assigned_to']==staff['Staff_code']))\n",
    " .join(level.filter(col('Manager')=='Y'), 'Level_code')\n",
    " .agg(count('Call_ref').alias('mlcc'))\n",
    " .toPandas())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.\n",
    "Show the manager for each shift. Your output should include the shift date and type; also the first and last name of the manager.\n",
    "\n",
    "```\n",
    "+------------+------------+------------+-----------+\n",
    "| Shift_date | Shift_type | first_name | last_name |\n",
    "+------------+------------+------------+-----------+\n",
    "| 2017-08-12 | Early      | Logan      | Butler    |\n",
    "| 2017-08-12 | Late       | Ava        | Ellis     |\n",
    "| 2017-08-13 | Early      | Ava        | Ellis     |\n",
    "| 2017-08-13 | Late       | Ava        | Ellis     |\n",
    "| 2017-08-14 | Early      | Logan      | Butler    |\n",
    "| 2017-08-14 | Late       | Logan      | Butler    |\n",
    "| 2017-08-15 | Early      | Logan      | Butler    |\n",
    "| 2017-08-15 | Late       | Logan      | Butler    |\n",
    "| 2017-08-16 | Early      | Logan      | Butler    |\n",
    "| 2017-08-16 | Late       | Logan      | Butler    |\n",
    "+------------+------------+------------+-----------+\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>shift_date</th>\n",
       "      <th>Shift_type</th>\n",
       "      <th>First_name</th>\n",
       "      <th>Last_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017-08-12</td>\n",
       "      <td>Early</td>\n",
       "      <td>Logan</td>\n",
       "      <td>Butler</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017-08-12</td>\n",
       "      <td>Late</td>\n",
       "      <td>Ava</td>\n",
       "      <td>Ellis</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2017-08-13</td>\n",
       "      <td>Early</td>\n",
       "      <td>Ava</td>\n",
       "      <td>Ellis</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2017-08-13</td>\n",
       "      <td>Late</td>\n",
       "      <td>Ava</td>\n",
       "      <td>Ellis</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2017-08-14</td>\n",
       "      <td>Early</td>\n",
       "      <td>Logan</td>\n",
       "      <td>Butler</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2017-08-14</td>\n",
       "      <td>Late</td>\n",
       "      <td>Logan</td>\n",
       "      <td>Butler</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2017-08-15</td>\n",
       "      <td>Early</td>\n",
       "      <td>Logan</td>\n",
       "      <td>Butler</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2017-08-15</td>\n",
       "      <td>Late</td>\n",
       "      <td>Logan</td>\n",
       "      <td>Butler</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>2017-08-16</td>\n",
       "      <td>Early</td>\n",
       "      <td>Logan</td>\n",
       "      <td>Butler</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2017-08-16</td>\n",
       "      <td>Late</td>\n",
       "      <td>Logan</td>\n",
       "      <td>Butler</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   shift_date Shift_type First_name Last_name\n",
       "0  2017-08-12      Early      Logan    Butler\n",
       "1  2017-08-12       Late        Ava     Ellis\n",
       "2  2017-08-13      Early        Ava     Ellis\n",
       "3  2017-08-13       Late        Ava     Ellis\n",
       "4  2017-08-14      Early      Logan    Butler\n",
       "5  2017-08-14       Late      Logan    Butler\n",
       "6  2017-08-15      Early      Logan    Butler\n",
       "7  2017-08-15       Late      Logan    Butler\n",
       "8  2017-08-16      Early      Logan    Butler\n",
       "9  2017-08-16       Late      Logan    Butler"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(shift.withColumn('shift_date', to_date(col('Shift_date')))\n",
    " .join(staff, on=(shift['Manager']==staff['Staff_code']))\n",
    " .select('shift_date', 'Shift_type', 'First_name', 'Last_name')\n",
    " .dropDuplicates()\n",
    " .orderBy('shift_date', 'Shift_type')\n",
    " .toPandas())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "sc.stop()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
